316 research outputs found
Robust Tuning Datasets for Statistical Machine Translation
We explore the idea of automatically crafting a tuning dataset for
Statistical Machine Translation (SMT) that makes the hyper-parameters of the
SMT system more robust with respect to some specific deficiencies of the
parameter tuning algorithms. This is an under-explored research direction,
which can allow better parameter tuning. In this paper, we achieve this goal by
selecting a subset of the available sentence pairs, which are more suitable for
specific combinations of optimizers, objective functions, and evaluation
measures. We demonstrate the potential of the idea with the pairwise ranking
optimization (PRO) optimizer, which is known to yield too short translations.
We show that the learning problem can be alleviated by tuning on a subset of
the development set, selected based on sentence length. In particular, using
the longest 50% of the tuning sentences, we achieve two-fold tuning speedup,
and improvements in BLEU score that rival those of alternatives, which fix
BLEU+1's smoothing instead.Comment: RANLP-201
Do peers see more in a paper than its authors?
Recent years have shown a gradual shift in the content of biomedical publications that is freely accessible, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full-text? What important information in the full-text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances-sentences containing citations to that article. We contrast the important points of an article as judged by its authors versus as seen by peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis, and we find that the set of all citances to a target article not only covers most information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects other citations and time have on the content of citances
Flattening the Curve of the COVID-19 Infodemic: These Evaluation Campaigns Can Help!
The World Health Organization acknowledged that “The 2019-nCoV outbreak and response has been accompanied by a massive ‘infodemic’ ... that makes it hard for people to find trustworthy sources and reliable guidance when they need it.” While fighting this infodemic is typically thought of in terms of factuality, the problem is much broader as malicious content includes not only “fake news”, rumors, and conspiracy theories, but also promotion of fake cures, panic, racism, xenophobia, and mistru..
- …